Preparing the dataset

Libraries used

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(purrr)

Reading in data and exclusions

data <- read.csv("data/main_data.csv")

#create a participant identifier (id) from the row number

data <- data %>% 
  mutate(id = row_number())

#removing missing data rows from dataset (12 in total) for participants who did not complete the survey

rows_with_blank_in_name <- grepl("^\\s*$", data$EPE_5)
df_with_blanks_in_name <- data[rows_with_blank_in_name, ]

data <- data[!rows_with_blank_in_name, ]

#removing row numbers from the dataset where both attention checks were failed (the process that identified these row numbers can be viewed in 'data_quality_check.Rmd')

data <- data %>%
  filter(!(row_number() %in% c(420, 948, 1288, 1315)))

#excluding participants who completed the survey in less than half the median time (360 seconds)

data <- data %>%
  filter(Duration..in.seconds. >= 360)

#this should leave a data-frame with 1322 observations
#extract relevant variables into new data frame

data <- data %>%
  select(id, Training.condition, Advert.1, Advert.2, Advert.3, Advert.4, starts_with("PK"), starts_with("agree"), starts_with("informed"), starts_with("accurate"), starts_with("believable"), starts_with("trustworthy"), starts_with("factual"), election_reg, recall_num, recall_name, starts_with("useful"), reg_know, starts_with("EPE"), starts_with("general_confidence"), starts_with("institution"), democracy, political_interest, external_efficacy, internal_efficacy, starts_with("SM"), partyID, age_sample, gender, education)
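The time-based exclusion above hard-codes the 360-second cut-off. As a minimal sketch, assuming the cut-off is meant to be half the sample median of Duration..in.seconds., the threshold could instead be derived from the data and the number of excluded rows reported. The toy durations and the names toy, threshold, kept and n_excluded are hypothetical and are not part of the original script.

```r
library(dplyr)

# Hypothetical completion times in seconds (not the study data)
toy <- tibble(id = 1:6,
              Duration..in.seconds. = c(300, 650, 720, 700, 900, 1500))

# Half the median completion time
threshold <- median(toy$Duration..in.seconds.) / 2

# Keep participants at or above the threshold and count how many were dropped
kept <- toy %>% filter(Duration..in.seconds. >= threshold)
n_excluded <- nrow(toy) - nrow(kept)
```

Deriving the threshold this way only reproduces the fixed value if the median is computed on the same set of rows that was used when 360 seconds was chosen.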

Agree/disagree item transformations

The code below will convert all variables with response measurement of strongly disagree to strongly agree from a character variable to a numerical scale of 1-7. One item also needs to be reverse scored:

  • Informed item 3: ‘I am not sure who is behind this material’

There are also attention checks in the dataset that need to be removed once exclusions have been dealt with:

  • informed_2_5
  • informed_2imprint_5
  • EPE_5

The code below creates the functions that will be applied to all agree-disagree response formats in the dataset: all variables that start with PK, agree, informed, EPE and general_confidence. Two conversion functions are needed because the persuasion knowledge items used slightly different response labels. A third function then reverse scores informed item three across the eight advert variations.

#converting to numeric variables from character for agree - disagree
#The persuasion knowledge measures accidentally used slightly different response options from the other measures ('slightly' instead of 'somewhat'), so two conversion functions are needed.

convert_numeric1 <- function(response) {
  
  # Trim leading and trailing whitespace and convert to lowercase
  response_cleaned <- tolower(trimws(response))
  
  # Define the mapping with all lowercase keys
  mapping <- c(
    "strongly disagree" = 1,
    "disagree" = 2,
    "slightly disagree" = 3,
    "neither agree nor disagree" = 4,
    "slightly agree" = 5,
    "agree" = 6,
    "strongly agree" = 7
  )
  
  # Look up each response; labels that are not in the mapping return NA
  return(unname(mapping[response_cleaned]))
}

convert_numeric2 <- function(response) {
  
  # Trim leading and trailing whitespace and convert to lowercase
  response_cleaned <- tolower(trimws(response))
  
  # Define the mapping with all lowercase keys
  mapping <- c(
    "strongly disagree" = 1,
    "disagree" = 2,
    "somewhat disagree" = 3,
    "neither agree nor disagree" = 4,
    "somewhat agree" = 5,
    "agree" = 6,
    "strongly agree" = 7
  )
  
  # Look up each response; labels that are not in the mapping return NA
  return(unname(mapping[response_cleaned]))
}

#applying this function to the data frame (two separate functions to account for differences in response options)
         
data <- data %>%
  mutate(across(starts_with("PK"), convert_numeric1))

data <- data %>%
  mutate(across(c(starts_with("informed"), starts_with("agree"), starts_with("EPE"), starts_with("general")), ~convert_numeric2(.x)))

#reverse scoring informed item 3

reverse_code <- function(response) {
  # Define the mapping from original to reversed scores
  mapping <- c(1, 2, 3, 4, 5, 6, 7)
  names(mapping) <- c(7, 6, 5, 4, 3, 2, 1)
  
  # Use the response as a name to look up in the mapping
  return(as.numeric(names(mapping)[match(response, mapping)]))
}

data <- data %>%
  mutate(across(c(informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3), ~reverse_code(.x)))

#removing the attention check columns from the dataset

data <- data %>%
  select(-informed_2_5, -informed_2imprint_5, -EPE_5)
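Because both conversion functions return NA for any label that is not in the mapping, a misspelt or unexpected response option would be dropped silently. The sketch below shows one possible check to run before converting, plus an arithmetic equivalent of the reverse scoring. The helper unmatched_labels and the toy responses are hypothetical, and the sketch assumes the 1-7 coding used above.

```r
# Hypothetical helper: list response labels that a mapping would turn into NA
unmatched_labels <- function(response, valid_labels) {
  cleaned <- tolower(trimws(response))
  # ignore blanks and genuinely missing values
  cleaned <- cleaned[!is.na(cleaned) & cleaned != ""]
  setdiff(unique(cleaned), valid_labels)
}

labels_somewhat <- c("strongly disagree", "disagree", "somewhat disagree",
                     "neither agree nor disagree", "somewhat agree",
                     "agree", "strongly agree")

# Toy responses: the 'slightly' label is not in the 'somewhat' mapping
toy_response <- c("Agree ", "Slightly agree", NA, "strongly disagree")
missed <- unmatched_labels(toy_response, labels_somewhat)

# Reverse scoring on a 1-7 scale: 1 <-> 7, 2 <-> 6, 3 <-> 5, 4 stays 4
reversed <- 8 - c(1, 2, 4, 7, NA)
```

An empty result from the check suggests that every non-missing response will receive a numeric score.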

Variable transformations for both RM and IM dataframes

The code below conducts the following transformations to the variables that will be present in both the repeated measures and independent measures data frames so they are ready to be analysed:

  • Transformed to a factor: Advert.1, Advert.2, Advert.3, Advert.4, Training.condition, reg_know, SM_use, SM_frequency_1, partyID, gender, education

  • Transformed to a numerical variable: election_reg, starts with: useful_rank, starts with: institution_trust, democracy, political_interest, external_efficacy, internal_efficacy, age_sample

Some variables will only be present in the repeated measures data frame and will be created later.

#creating factor variables through use of a function

convert_to_factor <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), as.factor))
}

data <- data %>%
  convert_to_factor(c("Advert.1", "Advert.2", "Advert.3", "Advert.4", "SM_frequency_1", "SM_use", "Training.condition", "reg_know", "partyID", "gender", "education"))

#Setting reference groups for: reg_know, SM_use, SM_frequency, gender, education

#regulation knowledge

reg_response_order <- c("There are no regulatory controls on any type of political advertising during UK elections", "All political advertising is regulated by rules set by the UK government, but there is one set of rules for advertising on television and radio and a different set of rules for advertising on the internet and social media", "All political advertising (whether on television, radio, in newspapers or the internet) is subject to the same rules set by the UK government", "Not sure")

data <- data %>%
  mutate(across(reg_know, ~factor(.x, levels = reg_response_order)))

#Social media use
  
use_response_order <- c("None, No time at all ", "Less than 1/2 hour ", "1/2 hour to 1 hour ", "1 to 2 hours ",  "Not sure")

data <- data %>%
  mutate(across(SM_use, ~factor(.x, levels = use_response_order)))
  
#SM frequency use
  
freq_response_order <- c("Never",
                         "Less than once a week",
                         "Once a week\t",
                         "Once every couple of days\t",
                         "Once a day\t",
                         "2-5 times a day",
                         "More than five times a day\t")

data <- data %>%
  mutate(across(SM_frequency_1, ~factor(.x, levels = freq_response_order)))

#gender, female as reference

gender_response_order <- c("Female", "Male", "Non-binary / third gender", "Prefer not to say")

data <- data %>%
  mutate(across(gender, ~factor(.x, levels = gender_response_order)))

#Education level, postgrad as reference

ed_response_order <- c("Postgraduate (e.g. M.Sc, Ph.D)", "Undergraduate University (e.g. BA, B.Sc, B.Ed)", "A-level, or equivalent", "GCSE level, or equivalent", "Other, please specify", "No formal qualifications")

data <- data %>%
  mutate(across(education, ~factor(.x, levels = ed_response_order)))

#The response options first need to be changed from categories to numbers for: election_reg, institution_trust, democracy, political_interest, internal_efficacy, external_efficacy, age

#Confidence in electoral regulation

data <- data %>%
  mutate(election_reg = case_when(
    election_reg == "Completely insufficient" ~ 1,
    election_reg == "Mostly insufficient" ~ 2,
    election_reg == "Slightly insufficient" ~ 3,
    election_reg == "No opinion/not sure" ~ 4,
    election_reg == "Slightly sufficient" ~ 5,
    election_reg == "Mostly sufficient" ~ 6,
    election_reg == "Completely sufficient" ~ 7
  ))

#Converting 'democracy' to a numeric variable

data <- data %>%
  mutate(democracy = case_when(
    democracy == "Very dissatisfied" ~ 1,
    democracy == "A little dissatisfied" ~ 2,
    democracy == "Fairly satisfied" ~ 3,
    democracy == "Very satisfied" ~ 4
  ))

#converting political interest to a numerical variable

data <- data %>%
  mutate(political_interest = case_when(
    political_interest == "Not at all interested" ~ 1,
    political_interest == "Not very interested" ~ 2,
    political_interest == "Slightly interested" ~ 3,
    political_interest == "Fairly interested" ~ 4,
    political_interest == "Very interested " ~ 5
  ))

#converting internal and external efficacy to numeric, 5 options

data <- data %>%
  mutate(internal_efficacy = case_when(
    internal_efficacy == "Not at all " ~ 1,
    internal_efficacy == "A little " ~ 2,
    internal_efficacy == "A moderate amount  " ~ 3,
    internal_efficacy == "A lot " ~ 4,
    internal_efficacy == "A great deal " ~ 5
  ))

data <- data %>%
  mutate(external_efficacy = case_when(
    external_efficacy == "Not at all " ~ 1,
    external_efficacy == "A little " ~ 2,
    external_efficacy == "A moderate amount  " ~ 3,
    external_efficacy == "A lot " ~ 4,
    external_efficacy == "A great deal " ~ 5
  ))

#creating numeric variables through the use of a function

convert_to_numeric <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), as.numeric))
}

#age

data$age_sample <- as.numeric(data$age_sample)

#Convert all other variables to numeric

data <- data %>%
  convert_to_numeric(c("useful_rank_1", "useful_rank_2", "useful_rank_3", "useful_rank_4", "useful_rank_5", "useful_rank_6"))
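Several of the level and response strings above contain trailing spaces or tab characters, and they have to match the raw responses exactly: factor() turns any value that is not listed in levels into NA, and case_when() leaves unmatched values as NA. A sketch of one way to make this less fragile, assuming it is acceptable to strip surrounding whitespace before matching, is shown below. The toy values and object names are hypothetical.

```r
# Raw responses as they might be exported, some with a trailing tab or space
raw <- c("Once a week\t", "Never", "Once a day ", "2-5 times a day")

# Levels written without trailing whitespace
clean_levels <- c("Never", "Once a week", "Once a day", "2-5 times a day")

# Without cleaning, values that differ only in whitespace become NA
naive <- factor(raw, levels = clean_levels)

# trimws() removes leading/trailing spaces, tabs and newlines before matching
cleaned <- factor(trimws(raw), levels = clean_levels)
```

Comparing the number of missing values before and after each conversion is a quick way to spot labels that failed to match.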

Recall variable transformations

Transformation of recall variables:

  • recall_num: two new columns need to be created, one specifying those who picked ‘not sure’ versus those who chose an answer, and one specifying those who were correct (chose 2) versus those who were incorrect.

  • recall_name: 8 potential columns will need to be created with a binary response, indicating whether each name option was identified e.g. ‘common sense collective’.

  • The correct identification options are:

    • Common sense collective - advert 1
    • Breaking barriers alliance - advert 2
    • Speak freely Inc. - advert 3
    • Campaign for a better Britain - advert 4
  • Incorrect options:

    • Future first
    • The people’s movement
    • Voice for the people
    • Hope something - removed from Qualtrics and replaced with ad 4
    • All together

#Recall number transformation for correct/incorrect response

data <- data %>%
  mutate(recall_correct = 
           case_when(
             recall_num == 2 ~ "correct",
             TRUE ~ "incorrect"
           ))

#Recall name transformation, correct responses

data <- data %>%
  mutate(CSC = case_when(
    str_detect(recall_name, "Common Sense Collective") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(BBA = case_when(
    str_detect(recall_name, "Breaking Barriers Alliance") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(SFI = case_when(
    str_detect(recall_name, "Speak Freely Inc") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(CBB = case_when(
    str_detect(recall_name, "Campaign for a better Britain") ~ 1,
    TRUE ~ 0
  ))

#incorrect responses

data <- data %>%
  mutate(FF = case_when(
    str_detect(recall_name, "Future First") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(TPM = case_when(
    str_detect(recall_name, "The People’s movement") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(VFP = case_when(
    str_detect(recall_name, "Voice for the People") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(AT = case_when(
    str_detect(recall_name, "All Together") ~ 1,
    TRUE ~ 0
  ))

#number of correct names recalled, name_correct

data <- data %>%
  mutate(name_correct = CSC + BBA + SFI + CBB)

#number of incorrect names recalled, name_incorrect

#add incorrect columns together

data <- data %>%
  mutate(name_incorrect = FF + TPM + VFP + AT)

#convert campaign names to factors

data <- data %>%
  convert_to_factor(c("recall_correct", "CSC", "BBA", "SFI", "CBB", "FF", "TPM", "VFP", "AT"))
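The eight near-identical mutate() calls above all follow the same rule: code 1 if the campaigner name appears in recall_name and 0 otherwise. A compact sketch of the same idea, driven by a named vector of column names and search strings, is given below. Only three names are shown, the toy responses are hypothetical, and the comma-separated format of recall_name is an assumption about the export.

```r
library(dplyr)
library(stringr)

# Column name -> campaigner name to look for (subset shown for brevity)
name_patterns <- c(CSC = "Common Sense Collective",
                   BBA = "Breaking Barriers Alliance",
                   FF  = "Future First")

# Toy multi-select responses (hypothetical)
toy <- tibble(recall_name = c("Common Sense Collective,Future First",
                              "Breaking Barriers Alliance",
                              NA))

for (col in names(name_patterns)) {
  # fixed() matches the name literally; missing responses are coded 0
  hit <- str_detect(toy$recall_name, fixed(name_patterns[[col]]))
  toy[[col]] <- as.integer(!is.na(hit) & hit)
}
```

Keeping the search strings in one place makes it easier to check them against the exact spelling and capitalisation used in the survey export.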

Repeated measures dataframe

The code below turns the wide data into long data, creating 4 rows for each participant and only one column for each of the outcome variables: persuasion knowledge, political goal, informedness, agreement, believability, trustworthiness, accurateness, factual. Extra columns also specify the advert viewed and the version (imprint or no imprint).

#create a new dataframe with only the repeated measures (post-advert) variables

RM <- data %>%
  select(id, starts_with("Advert."), starts_with("PK"), starts_with("agree"), starts_with("informed"), starts_with("accurate"), starts_with("believable"), starts_with("trustworthy"), starts_with("factual"))

#when first converted into long data, eight rows are generated for each participant for the eight different advert variations, but many columns contain NA.

#persuasion knowledge df, each item separate

PK1_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_1, PK_1imprint_1, PK_2_1, PK_2imprint_1, PK_3_1, PK_3imprint_1, PK_4_1, PK_4imprint_1) %>%
  pivot_longer(
    cols = c(PK_1_1, PK_1imprint_1, PK_2_1, PK_2imprint_1, PK_3_1, PK_3imprint_1, PK_4_1, PK_4imprint_1),
    names_to = "PK1",
    values_to = "PK1_value"
  )

PK2_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_2, PK_1imprint_2, PK_2_2, PK_2imprint_2, PK_3_2, PK_3imprint_2, PK_4_2, PK_4imprint_2) %>%
  pivot_longer(
    cols = c(PK_1_2, PK_1imprint_2, PK_2_2, PK_2imprint_2, PK_3_2, PK_3imprint_2, PK_4_2, PK_4imprint_2),
    names_to = "PK2",
    values_to = "PK2_value"
  )

PK3_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_3, PK_1imprint_3, PK_2_3, PK_2imprint_3, PK_3_3, PK_3imprint_3, PK_4_3, PK_4imprint_3) %>%
  pivot_longer(
    cols = c(PK_1_3, PK_1imprint_3, PK_2_3, PK_2imprint_3, PK_3_3, PK_3imprint_3, PK_4_3, PK_4imprint_3),
    names_to = "PK3",
    values_to = "PK3_value"
  )

PK4_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_4, PK_1imprint_4, PK_2_4, PK_2imprint_4, PK_3_4, PK_3imprint_4, PK_4_4, PK_4imprint_4) %>%
  pivot_longer(
    cols = c(PK_1_4, PK_1imprint_4, PK_2_4, PK_2imprint_4, PK_3_4, PK_3imprint_4, PK_4_4, PK_4imprint_4),
    names_to = "PK4",
    values_to = "PK4_value"
  )


#political goal df, informed item 1

PG_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_1, informed_1imprint_1, informed_2_1, informed_2imprint_1, informed_3_1, informed_3imprint_1, informed_4_1, informed_4imprint_1) %>%
  pivot_longer(
    cols = c(informed_1_1, informed_1imprint_1, informed_2_1, informed_2imprint_1, informed_3_1, informed_3imprint_1, informed_4_1, informed_4imprint_1),
    names_to = "political_goal",
    values_to = "PG_value"
  )

#informed df, each item separate

informed2_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_2, informed_1imprint_2, informed_2_2, informed_2imprint_2, informed_3_2, informed_3imprint_2, informed_4_2, informed_4imprint_2) %>%
  pivot_longer(
    cols = c(informed_1_2, informed_1imprint_2, informed_2_2, informed_2imprint_2, informed_3_2, informed_3imprint_2, informed_4_2, informed_4imprint_2),
    names_to = "informed2",
    values_to = "informed2_value"
  )

informed3_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3) %>%
  pivot_longer(
    cols = c(informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3),
    names_to = "informed3",
    values_to = "informed3_value"
  )

informed4_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_4, informed_1imprint_4, informed_2_4, informed_2imprint_4, informed_3_4, informed_3imprint_4, informed_4_4, informed_4imprint_4) %>%
  pivot_longer(
    cols = c(informed_1_4, informed_1imprint_4, informed_2_4, informed_2imprint_4, informed_3_4, informed_3imprint_4, informed_4_4, informed_4imprint_4),
    names_to = "informed4",
    values_to = "informed4_value"
  )

#agreement df

agree_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("agree")) %>%
  pivot_longer(
    cols = starts_with("agree"),
    names_to = "agree",
    values_to = "agree_value"
  )

#trustworthy df

trustworthy_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("trustworthy")) %>%
  pivot_longer(
    cols = starts_with("trustworthy"),
    names_to = "trustworthy",
    values_to = "trustworthy_value"
  )

#believability df

believe_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("believable")) %>%
  pivot_longer(
    cols = starts_with("believable"),
    names_to = "believable",
    values_to = "believable_value"
  )

#accurateness df

accurate_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("accurate")) %>%
  pivot_longer(
    cols = starts_with("accurate"),
    names_to = "accurate",
    values_to = "accurate_value"
  )

#factual df

factual_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("factual")) %>%
  pivot_longer(
    cols = starts_with("factual"),
    names_to = "factual",
    values_to = "factual_value"
  )

#Create two new variables in each indicating advert type and version viewed, so that the dataframes can be merged by these two columns

#Below are three functions that can be applied to each df to create the new variables.

# Function to add 'advert' and 'version' based on patterns in a specified column
add_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "1") ~ "advert.1",
        str_detect(!!sym(column_name), "2") ~ "advert.2",
        str_detect(!!sym(column_name), "3") ~ "advert.3",
        str_detect(!!sym(column_name), "4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

#apply function for agree, trust, believe, factual, accurate

agree_long <- add_advert_version(agree_long, "agree")
trustworthy_long <- add_advert_version(trustworthy_long, "trustworthy")
believe_long <- add_advert_version(believe_long, "believable")
accurate_long <- add_advert_version(accurate_long, "accurate")
factual_long <- add_advert_version(factual_long, "factual")

#PK function

PK_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "PK_1") ~ "advert.1",
        str_detect(!!sym(column_name), "PK_2") ~ "advert.2",
        str_detect(!!sym(column_name), "PK_3") ~ "advert.3",
        str_detect(!!sym(column_name), "PK_4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

PK1_long <- PK_advert_version(PK1_long, "PK1")
PK2_long <- PK_advert_version(PK2_long, "PK2")
PK3_long <- PK_advert_version(PK3_long, "PK3")
PK4_long <- PK_advert_version(PK4_long, "PK4")

#informed function

in_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "informed_1") ~ "advert.1",
        str_detect(!!sym(column_name), "informed_2") ~ "advert.2",
        str_detect(!!sym(column_name), "informed_3") ~ "advert.3",
        str_detect(!!sym(column_name), "informed_4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

PG_long <- in_advert_version(PG_long, "political_goal")
informed2_long <- in_advert_version(informed2_long, "informed2")
informed3_long <- in_advert_version(informed3_long, "informed3")
informed4_long <- in_advert_version(informed4_long, "informed4")

#the code below creates a function that filters out redundant rows, leaving 4 for each participant

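clean_NA <- function(df) {
  # Identify the column(s) ending with '_value'
  value_cols <- names(df)[grepl("_value$", names(df))]
  
  # Only filter when there is at least one column ending with '_value'
  if (length(value_cols) > 0) {
    df <- df %>%
      # keep rows where every '_value' column is non-missing
      # (if_all() also works if more than one column matches)
      filter(if_all(all_of(value_cols), ~ !is.na(.x))) %>%
      # keep one row per participant and advert
      distinct(id, advert, .keep_all = TRUE)
  }
  
  return(df)
}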

#apply this function to all dataframes, specified through their shared name of '_long' at the end of each df

df_names <- ls(pattern = "_long$")
df_list <- mget(df_names, envir = .GlobalEnv)

for (name in names(df_list)) {
  assign(name, clean_NA(get(name)), envir = .GlobalEnv)
}

#merge the dataframes back together by matching advert, participant id and version

rm_list <- list(PK1_long, PK2_long, PK3_long, PK4_long, PG_long, informed2_long, informed3_long, informed4_long, agree_long, trustworthy_long, accurate_long, believe_long, factual_long)

merged_rm <- reduce(rm_list, full_join, by = c("id", "advert", "version", "Advert.1", "Advert.2", "Advert.3", "Advert.4"))

#changing order of columns

merged_rm <- merged_rm %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, advert, version, everything())

#delete the variable columns e.g., 'PK1', 'informed2'

repeated_measures <- merged_rm %>%
  select(-c(PK1, PK2, PK3, PK4, political_goal, informed2, informed3, informed4, agree, trustworthy, believable, accurate, factual))
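The reshaping above pivots each item separately and then recovers the advert and version from the old column names. As an alternative sketch, pivot_longer() can split the column names during the pivot through names_pattern. The example assumes column names of the form PK_<advert>[imprint]_<item>, as used for the PK items above; the toy values and the object names toy_wide and toy_long are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy wide data: 2 participants, 2 adverts, PK item 1 only (hypothetical values)
toy_wide <- tibble(
  id = c(1, 2),
  PK_1_1 = c(5, NA),
  PK_1imprint_1 = c(NA, 6),
  PK_2_1 = c(NA, 3),
  PK_2imprint_1 = c(4, NA)
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols = starts_with("PK_"),
    # three capture groups: advert number, "imprint" or empty, item number
    names_to = c("advert", "version", "item"),
    names_pattern = "PK_(\\d)(imprint|)_(\\d)",
    values_to = "PK_value",
    # drop the advert versions a participant did not see
    values_drop_na = TRUE
  ) %>%
  mutate(
    advert = paste0("advert.", advert),
    version = as.integer(version %in% "imprint")
  )
```

Each participant keeps one row per advert and item, with version coded 1 for the imprint version and 0 otherwise, which matches the coding used in the functions above.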

The code chunk below mean scores the persuasion knowledge items and the informed items. These are not the only scales that will be mean scored, but they are the only mean-scored scales in the repeated measures part of the experiment (post-advert questions). Mean scoring of the EPE and political trust items occurs in a later section.

#ungroup() removes the row-wise grouping afterwards so that later steps operate on whole columns

repeated_measures <- repeated_measures %>%
  rowwise() %>%
  mutate(PK = mean(c(PK1_value, PK2_value, PK3_value, PK4_value))) %>%
  ungroup()

repeated_measures <- repeated_measures %>%
  rowwise() %>%
  mutate(informed = mean(c(informed2_value, informed3_value, informed4_value))) %>%
  ungroup()

#changing the order of columns

repeated_measures <- repeated_measures %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, advert, version, PK, informed, PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value, everything())
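The mean scores above are computed one row at a time with rowwise(). A vectorised sketch of the same calculation using base R's rowMeans() is shown below; it also makes the handling of missing items explicit. The toy values and the column PK_available are hypothetical and are not part of the original script.

```r
library(dplyr)

# Toy item scores (hypothetical); the second participant skipped item 2
toy <- tibble(PK1_value = c(4, 6), PK2_value = c(5, NA),
              PK3_value = c(6, 6), PK4_value = c(5, 6))

toy <- toy %>%
  mutate(
    # NA if any item is missing, the same behaviour as mean() without na.rm
    PK = rowMeans(cbind(PK1_value, PK2_value, PK3_value, PK4_value)),
    # alternative: average over the items that were answered
    PK_available = rowMeans(cbind(PK1_value, PK2_value, PK3_value, PK4_value),
                            na.rm = TRUE)
  )
```

Whether a partially answered scale should be scored at all is an analytic decision; the first version mirrors what the code above does.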

Merged repeated measures data frame

The code below will now merge relevant variables from outside the repeated measures part of the experiment with this dataframe e.g., training condition, demographic variables and recall measures.

Variable descriptions for those with unclear names:

  • useful_rank_1 = where ‘voters’ were ranked by participants
  • SM_frequency_1 = how often participants use Facebook

#creating a new df with relevant variables e.g., controls for models

control_measures <- data %>%
  select(id, Training.condition, recall_num, recall_name, recall_correct, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, reg_know, useful_rank_1, political_interest, SM_use, SM_frequency_1, partyID, age_sample, gender, education)

#matching id number with the repeated measures dataframe so these variables are repeated across rows

imprint_df <- repeated_measures %>%
  left_join(control_measures, by = "id")

#changing the order of columns

imprint_df <- imprint_df %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, Training.condition, advert, version, PK, informed, PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value, recall_num, recall_correct, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, political_interest, reg_know, SM_use, SM_frequency_1, partyID, age_sample, gender, education, everything())
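The left join above relies on control_measures having exactly one row per id; a duplicated id would silently multiply rows in the repeated measures data. A small sketch of the checks that could accompany such a join is shown below, using hypothetical toy data frames.

```r
library(dplyr)

# Toy long data (two adverts per participant) and one-row-per-id controls
toy_rm <- tibble(id = c(1, 1, 2, 2),
                 advert = c("advert.1", "advert.2", "advert.1", "advert.2"))
toy_controls <- tibble(id = c(1, 2), gender = c("Female", "Male"))

# The lookup table must not contain duplicated ids
stopifnot(anyDuplicated(toy_controls$id) == 0)

joined <- toy_rm %>%
  left_join(toy_controls, by = "id")

# The join should add columns without adding or dropping rows
stopifnot(nrow(joined) == nrow(toy_rm))
```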

The code below conducts the following transformations to the variables so they are ready to be analysed:

  • Transformed to a factor: version, advert
  • Transformed to a numerical variable: PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value

#functions created in earlier section

imprint_df <- imprint_df %>%
  convert_to_factor(c("version", "advert"))

imprint_df <- imprint_df %>%
  convert_to_numeric(c("PG_value", "agree_value", "trustworthy_value", "believable_value", "accurate_value", "factual_value"))

Independent measures data frame

Another aspect of the analysis will only require one row per participant, such as when testing the effect of the training condition on various outcomes e.g., confidence in regulation or epistemic political efficacy.

training_df <- data %>%
  select(id, Training.condition, Advert.1, Advert.2, Advert.3, Advert.4, election_reg, recall_num, recall_correct, name_correct, name_incorrect, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, starts_with("useful_rank"), reg_know, starts_with("EPE"), starts_with("general_confidence"), starts_with("institution_trust"), democracy, political_interest, external_efficacy, internal_efficacy, SM_use, starts_with("SM_frequency"), partyID, age_sample, gender, education)

#Mean scoring EPE

training_df <- training_df %>%
  rowwise() %>%
  mutate(EPE_mean = mean(c(EPE_1, EPE_2, EPE_3, EPE_4)))

#Mean scoring trust, mistrust and cynicism

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_trust = mean(c(general_confidence_1, general_confidence_2, general_confidence_3)))

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_mistrust = mean(c(general_confidence_4, general_confidence_5, general_confidence_6)))

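#ungroup() removes the row-wise grouping so that later analyses operate on whole columns

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_cynicism = mean(c(general_confidence_7, general_confidence_8, general_confidence_9))) %>%
  ungroup()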

Cleaning up the R environment

rm(list=setdiff(ls(), c("data", "imprint_df", "training_df")))

This document relies on the correct data frames having been formed by the code above, which can be viewed under ‘details’. The same code is also stored in a separate R Markdown document, ‘datawrangling_code.Rmd’. The data frames used in the following analyses are called imprint_df and training_df. Some hypotheses are tested using imprint_df, which includes four rows for each participant to capture the repeated measures part of the experiment. Others are tested using training_df, which includes only one row per participant.

Research questions

Research theme 1: the effect of viewing a digital imprint on subsequent evaluations

Research theme 2: the effect of being informed about the purpose of digital imprints on subsequent evaluations

Hypotheses

Research question 1:

  • H1a: Digital imprints will increase respondents’ knowledge about the source of a piece of digital campaign material, with regard to the campaigners’ political and persuasive intent.
  • H1b: The presence of a digital imprint will not increase respondents’ memory of the names of campaigners whose post they viewed.
  • H1c: The presence of a digital imprint will not increase respondents’ perception that they are more informed about the source of campaign material.

Research question 2:

  • H2a: Those who are informed about the purpose of digital imprints will be more likely to correctly recall the names of the campaigners.
  • H2b: Those who are informed about the purpose of digital imprints will perceive themselves as informed about the source of a piece of material if and only if a digital imprint is present.

Research question 3:

  • H3: Those who are informed about the purpose of digital imprints will perceive campaign content as more trustworthy if and only if a digital imprint is present with the content.

Research question 4:

  • H4: Those who are informed about the purpose of an imprint are more likely to perceive campaign laws as sufficient compared to those who are not informed about the purpose.

R packages: visualisation and analysis

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(Matrix)
library(sjPlot)
library(ggplot2)
library(ggeffects)
library(performance)
library(see)
library(patchwork)
library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(broom)
library(htmltools)
library(rlang)
## 
## Attaching package: 'rlang'
## The following objects are masked from 'package:purrr':
## 
##     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
##     flatten_raw, invoke, splice
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(lattice)
library(afex)
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
## 
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
## 
##     lmer

Above are the R packages used for analysis and visualisation.

Scale reliability

The table below shows Cronbach’s alpha for the mean-scored scales in the dataset:

  • Persuasion knowledge
  • Perceived informedness

Cronbach’s Alpha for Scales

Scale                    Alpha
Persuasion knowledge     0.69
Perceived informedness   0.88
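The code that produced these values is not shown in this section. As a hedged sketch of what the statistic measures, Cronbach’s alpha for k items can be computed from the item variances and the variance of the summed score. The helper cronbach_alpha and the toy items are hypothetical; the alpha() function in the psych package is a packaged alternative.

```r
# alpha = k / (k - 1) * (1 - sum(item variances) / variance of the total score)
cronbach_alpha <- function(items) {
  # use complete rows only
  items <- as.matrix(items[complete.cases(items), , drop = FALSE])
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Toy data: three identical items are perfectly consistent, so alpha is 1
toy_items <- data.frame(i1 = c(1, 3, 5, 7),
                        i2 = c(1, 3, 5, 7),
                        i3 = c(1, 3, 5, 7))
alpha_identical <- cronbach_alpha(toy_items)
```

For the scales above, the input would be the item-level columns for one scale in the repeated measures data, for example the four PK item columns.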

Sample

A representative sample option was used, matching UK census data for age, gender and ethnicity. The following table shows the make-up of the sample for these demographics, as well as for education and political party identification, after exclusions.

Univariate statistics

The tables below provide descriptive statistics for the key predictor and outcome variables. The first table lists the continuous measures in the repeated measures part of the experiment, and the second shows confidence in regulation, which was only measured once. Perceived political goal was heavily skewed towards the top of the scale (mean 6.03 on the 1-7 scale), which suggests the political nature of each post was easy for participants to infer.

Summary Statistics for repeated measures variables

Variable                  Mean  Median  SD    Min  Max  Q1  Q3
Political goal            6.03  6.00    1.13  1    7    6   7.00
Persuasion knowledge      4.85  4.75    1.07  1    7    4   5.75
Perceived informedness    4.21  4.33    1.57  1    7    3   5.67
Agreement                 4.54  5.00    1.52  1    7    4   6.00
Trustworthiness           3.68  4.00    1.49  1    7    3   5.00
Believability             4.50  5.00    1.55  1    7    4   6.00
Factualness               3.26  3.00    1.80  1    7    2   5.00
Accuracy                  4.21  4.00    1.46  1    7    3   5.00

Summary Statistics for independent measure variables

Variable                  Mean  Median  SD    Min  Max  Q1  Q3
Confidence in regulation  2.95  3       1.43  1    7    2   4
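The code behind these tables is not shown in this section. A minimal sketch of how one row of such a summary could be produced with dplyr is given below; the toy scores and object names are hypothetical, and the quartiles use R's default quantile type, which is an assumption about the original analysis.

```r
library(dplyr)

# Toy 1-7 scores (hypothetical)
toy <- tibble(trustworthy_value = c(1, 3, 4, 4, 5, 7))

toy_summary <- toy %>%
  summarise(
    Mean   = mean(trustworthy_value, na.rm = TRUE),
    Median = median(trustworthy_value, na.rm = TRUE),
    SD     = sd(trustworthy_value, na.rm = TRUE),
    Min    = min(trustworthy_value, na.rm = TRUE),
    Max    = max(trustworthy_value, na.rm = TRUE),
    # unname() drops the "25%" / "75%" labels that quantile() attaches
    Q1     = unname(quantile(trustworthy_value, 0.25, na.rm = TRUE)),
    Q3     = unname(quantile(trustworthy_value, 0.75, na.rm = TRUE))
  )
```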

The same information is shown below, split by the two conditions: training and version viewed.

The distribution of each variable is then shown below as a histogram.

[Figures: histograms of PG value, PK, informed, agree value, trustworthy value, believable value, factual value, accurate value and election regulation]

The table below shows the percentage of participants who recalled each of the campaigner names. There is variation between the names, with ‘Speak Freely Inc’ resulting in the lowest recall and ‘Campaign for a Better Britain’ the highest. This suggests some names were either more memorable than others, or were more visually obvious on the page. This variation allows us to investigate whether digital imprints consistently improve recall regardless of these overall differences in recall between names, which helps uncover how effective digital imprints are, across different formats, in increasing citizen awareness of which campaigners are potentially targeting them during an election. Assessing each advert separately, which will be included as a supplementary analysis to hypothesis 2a, helps decipher how obvious digital imprints need to be to increase recall.

Percentage of Participants Who Recalled Each Name
Name Recall % No recall %
Common Sense Collective 63.09 36.91
Breaking Barriers Alliance 46.22 53.78
Speak Freely Inc 37.67 62.33
Campaign for a Better Britain 71.41 28.59
Total 54.60 45.40

The output below shows how regulation knowledge was distributed across the whole sample, with the third option being the correct answer. This measure is not included in the pre-registered analysis, but it helps give a sense of how knowledgeable the sample was about regulatory law in the UK. The training and no-training conditions are shown separately. The training did not appear to affect the frequency of responses: in both conditions, the correct response option was chosen by the largest number of participants.

Hypothesis 1a, outcome: political goal

Did the presence of a digital imprint with a piece of campaign material increase participant awareness that the material had a political goal?

The following models include random intercepts for participant ID and advert. This is because individual participants are expected to evaluate the adverts from different baselines. Fitting a model that accounts for this variance in baseline assessments recognises the individual differences in perception and evaluation that occur in response patterns. Despite these baseline differences, however, it is assumed that how participants assess the outcomes across the adverts will remain consistent.

Random intercepts are also incorporated for the adverts themselves, acknowledging that each advert might also elicit different baseline levels of political goal recognition and persuasion knowledge due to their content or nature. Capturing this variability reflects the reality that some adverts, by their design, are more politically charged or persuasive than others, thereby starting from different evaluative baselines.

The choice to prioritise random intercepts but not random slopes in this model is driven by a key assumption: while there is expected variability in baseline evaluations (both at the level of individual participants and individual adverts), the influence exerted by the presence of digital imprints on the outcome measures is expected to be uniform across participants and adverts.

  • Outcome: PG_value, numerical 1-7
  • Predictor: version, binary factor
  • Random effects: id and advert
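
As a sketch of what the random-intercept structure described above implies, the model can be written out as an equation. The notation here (i for participant, j for advert) is introduced for illustration and is not taken from the original analysis; the training term is included because it appears in the results table for this model.

```latex
\begin{aligned}
\text{PG}_{ij} &= \beta_0 + \beta_1\,\text{version}_{ij} + \beta_2\,\text{training}_{i} + u_i + v_j + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0,\ \tau_{00,\text{id}}), \qquad
v_j \sim \mathcal{N}(0,\ \tau_{00,\text{advert}}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\ \sigma^2)
\end{aligned}
```

Here u_i and v_j shift the baseline for each participant and each advert, while the single coefficient on version reflects the assumption that the imprint effect is the same across participants and adverts (no random slopes).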

Model 1 (includes original and updated model)

The plot below shows the skewed nature of the outcome variable, as well as the association separately for each advert.

## `geom_smooth()` using formula = 'y ~ x'

Model outcomes: table

  Perceived Political Nature
Predictors Estimates CI p
Intercept 6.01 5.52 – 6.51 <0.001
Digital Imprint included 0.05 -0.00 – 0.09 0.063
Training Condition -0.01 -0.09 – 0.06 0.780
Random Effects
σ2 0.78
τ00 id 0.30
τ00 advert 0.25
ICC 0.41
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.000 / 0.411
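
As a rough check on how the ICC in the table above relates to the variance components, the following sketch recomputes it from the rounded values reported in the table. It assumes the ICC is the share of total variance attributable to the random intercepts; the helper name `icc_from_components` is hypothetical.

```r
# Sketch: ICC as random-intercept variance over total variance.
# Inputs are the rounded values from the table, so results are approximate.
icc_from_components <- function(tau, sigma2) {
  sum(tau) / (sum(tau) + sigma2)
}

tau_pg    <- c(id = 0.30, advert = 0.25)  # tau00 id, tau00 advert
sigma2_pg <- 0.78                         # residual variance

icc_pg <- icc_from_components(tau_pg, sigma2_pg)
round(icc_pg, 2)  # 0.41
```

Because the marginal R2 is close to zero for this model, the conditional R2 is driven almost entirely by the random intercepts, which is why it is close to the ICC.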

Plotting: raw data

Plotting: model predictions

Model assumptions

The assumptions checked below are as follows:

  • normality of residuals
  • variance of residuals
  • normality of residuals for each of the random effects (id and advert)

Because the political goal variable is so heavily skewed, the assumptions for this model are not met. This is likely because the political nature of the posts could easily be inferred from the context of the post, without requiring a digital imprint to alert participants to it. Even though the validity of the predictions is called into question, it can likely still be concluded that the political nature of the post was easy for participants to infer, making it unlikely that a digital imprint would alter this perception in this context.

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.93935, p-value < 2.2e-16
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.89179, p-value = 0.3914
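
The code that produced `id_effects` and `advert_effects` is not shown, so the following is only a sketch of one way such vectors could be obtained and tested. It uses simulated data (all values hypothetical) and assumes the lme4 package is installed.

```r
library(lme4)

# Simulated stand-in for the repeated measures data: 200 participants x 4 adverts
set.seed(1)
n_id <- 200
sim <- expand.grid(id = factor(seq_len(n_id)),
                   advert = factor(paste0("advert.", 1:4)))
sim$version <- rbinom(nrow(sim), 1, 0.5)
id_int <- rnorm(n_id, sd = 0.6)   # participant baselines
ad_int <- rnorm(4, sd = 0.5)      # advert baselines
sim$y <- 6 + 0.05 * sim$version +
  id_int[as.integer(sim$id)] + ad_int[as.integer(sim$advert)] +
  rnorm(nrow(sim), sd = 0.8)

fit <- lmer(y ~ version + (1 | id) + (1 | advert), data = sim)

# Extract the estimated random intercepts for each grouping factor
id_effects     <- ranef(fit)$id[["(Intercept)"]]
advert_effects <- ranef(fit)$advert[["(Intercept)"]]

sw_id     <- shapiro.test(id_effects)
sw_advert <- shapiro.test(advert_effects)
```

Note that the advert-level test is based on only four values, so a non-significant result there carries little information about normality.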

Hypothesis 1a, outcome: persuasion knowledge

Did the presence of a digital imprint with a piece of campaign material increase participant awareness that the material was trying to persuade them of a certain viewpoint?

  • Outcome: PK, numerical 1-7
  • Predictor: version, binary factor
  • Random effects: id and advert

Model

#ainform_pk <- lmer(PK ~ version + (1|id) + (1|advert), data = imprint_df)

#issue with convergence for the model below; used the afex package to investigate, and the bobyqa optimizer was able to converge

ainform_pk1 <- lmer(PK ~ version + Training.condition + (1 | id) + (1|advert), data = imprint_df, control = lmerControl(optimizer = "bobyqa"))

#random slope for version within advert

ainform_pk2 <- lmer(PK ~ version + Training.condition + (1| id) + (1 + version|advert), data = imprint_df)

# random intercepts for each advert-by-version combination

ainform_pk3 <- lmer(PK ~ version + Training.condition + (1 | id) + (1|advert:version), data = imprint_df)

# fixed effect interaction version and training

ainform_pk4 <- lmer(PK ~ version*Training.condition + (1 | id) + (1|advert), data = imprint_df)


AIC(ainform_pk1, ainform_pk2, ainform_pk3, ainform_pk4)
##             df      AIC
## ainform_pk1  6 13985.09
## ainform_pk2  8 13988.71
## ainform_pk3  6 13997.46
## ainform_pk4  7 13990.99
#compare models to see if fit is improved

# checking against a reduced model

mixed(PK ~ version + Training.condition + (1 | id) + (1|advert), data = imprint_df, control = lmerControl(optimizer = "bobyqa"), method = 'LRT')
## Contrasts set to contr.sum for the following variables: version, Training.condition, advert
## REML argument to lmer() set to FALSE for method = 'PB' or 'LRT'
## Mixed Model Anova Table (Type 3 tests, LRT-method)
## 
## Model: PK ~ version + Training.condition + (1 | id) + (1 | advert)
## Data: imprint_df
## Df full model: 6
##               Effect df     Chisq p.value
## 1            version  1 22.25 ***   <.001
## 2 Training.condition  1 14.17 ***   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
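
To make the AIC comparison above easier to read, the reported values can be expressed as differences from the best-fitting model. This is a small sketch that re-enters the AIC values printed above by hand rather than refitting the models.

```r
# AIC values copied from the output above
aic <- c(ainform_pk1 = 13985.09,
         ainform_pk2 = 13988.71,
         ainform_pk3 = 13997.46,
         ainform_pk4 = 13990.99)

# Difference from the lowest AIC; 0 marks the preferred model
delta_aic <- aic - min(aic)
best_model <- names(which.min(aic))

sort(round(delta_aic, 2))
best_model  # "ainform_pk1"
```

One caveat: if these models were fitted with lmer's default REML estimation, AIC comparisons between models with different fixed effects (such as ainform_pk1 and ainform_pk4) should be treated with caution; refitting with `REML = FALSE` is the usual approach for that comparison.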

Model outcomes: table

  Persuasion knowledge
Predictors Estimates CI p
Intercept 4.73 4.27 – 5.19 <0.001
Digital Imprint included 0.10 0.06 – 0.14 <0.001
Training Condition 0.15 0.07 – 0.23 <0.001
Random Effects
σ2 0.60
τ00 id 0.37
τ00 advert 0.21
ICC 0.49
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.007 / 0.497

Plotting: raw data

Plotting: model predictions

Model assumptions

The assumptions checked below are as follows:

  • normality of residuals
  • variance of residuals
  • normality of residuals for each of the random effects (id and advert)

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99409, p-value = 4.23e-05
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.88433, p-value = 0.3575

Hypothesis 1b, outcome: recall of names

Did viewing a digital imprint with a piece of campaign material increase participants’ memory of the campaigner name?

To test this, recall of the campaigner name is modelled with a logistic regression to see if viewing the imprint boosted recall of the name. This uses a newly created data frame: recall_transform.

The first model is a simple logistic regression. Because the data are nested, particularly through advert-level variation, a second mixed-effects model introduces random intercepts for the four adverts to account for some names being more memorable than others. Comparing the two models shows the mixed model is a slightly better fit for the data (the assessment can be viewed under ‘assumptions’), and the effect remains significant.

Correct names:

  • Advert 1: Common sense collective: CSC
  • Advert 2: Breaking barriers alliance: BBA
  • Advert 3: Speak freely inc: SFI
  • Advert 4: Campaign for a better Britain: CBB

Model

  • Outcome: recall, binary factor
  • Predictor: version, binary factor
## Contrasts set to contr.sum for the following variables: recall, version, Training.condition, advert
## Mixed Model Anova Table (Type 3 tests, LRT-method)
## 
## Model: recall ~ version * Training.condition + (1 | id) + (1 | advert)
## Data: recall_df
## Df full model: 6
##                       Effect df     Chisq p.value
## 1                    version  1 26.50 ***   <.001
## 2         Training.condition  1   8.96 **    .003
## 3 version:Training.condition  1    4.51 *    .034
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Table

Plotting: raw data

Plotting: model predictions

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Accounting for the nested data

The model variations below are compared to see which has the lowest AIC. The model with the lowest AIC (material_fixed) was the one that included the advert as a fixed effect, rather than a random intercept.

## boundary (singular) fit: see help('isSingular')
##                df      AIC
## initial         5 6835.617
## interaction    10 6816.901
## material_fixed  7 6814.925

Assumptions and comparative model fit

The graphs below check for normality in the residuals of the model. A straight line suggests normality. In the random effects model (right) there is a heavier tail towards the middle, suggesting some variability in the data is not captured by the model. These are not strict assumptions for a model that uses a binomial distribution, but they are included to give a full picture of the model's fit.

Simple residual plot

Supplementary analysis: advert level variations

The model below explores whether the effect of the imprint on persuasion knowledge depends on the advert.

model <- lmer(PK ~ version * advert + (1 | id), data = imprint_df)

tab_model(model)
  PK
Predictors Estimates CI p
(Intercept) 5.41 5.34 – 5.48 <0.001
version [1] 0.09 -0.01 – 0.18 0.064
advert [advert.2] -0.51 -0.60 – -0.41 <0.001
advert [advert.3] -0.98 -1.07 – -0.89 <0.001
advert [advert.4] -0.95 -1.04 – -0.86 <0.001
version [1] × advert [advert.2] 0.06 -0.09 – 0.21 0.423
version [1] × advert [advert.3] 0.05 -0.08 – 0.19 0.436
version [1] × advert [advert.4] -0.07 -0.20 – 0.07 0.316
Random Effects
σ2 0.60
τ00 id 0.37
ICC 0.38
N id 1322
Observations 5288
Marginal R2 / Conditional R2 0.145 / 0.474
# Get predicted values
predicted_PK <- predict(model, newdata = imprint_df)

# Add predicted values to the data frame
imprint_df1 <- cbind(imprint_df, predicted_PK)

with(imprint_df1, interaction.plot(advert, version, predicted_PK,
                                  type = "b", pch = c(1, 16), 
                                  col = c("red", "blue"), 
                                  main = "Interaction Plot", 
                                  xlab = "Advert", 
                                  ylab = "Predicted PK"))

Did viewing a digital imprint with a piece of campaign material increase participants’ memory of the campaigner name, and how was this impacted by the aesthetic specifics of the advert?

To test this, each correct campaign name is tested one by one with a logistic regression model to see if viewing the imprint boosted recall of the name. There are therefore 4 models and corresponding visualisations for this measure. This uses the independent measures data frame with only one row per participant: training_df.

Correct names:

  • Advert 1: Common sense collective: CSC
  • Advert 2: Breaking barriers alliance: BBA
  • Advert 3: Speak freely inc: SFI
  • Advert 4: Campaign for a better Britain: CBB

This analysis also tells us something about how the imprint impacted recall for each advert differently. If imprints only increase recall on some adverts and not others, this may be related to the formatting and aesthetic features of the post itself, e.g., how obvious the campaign name was, or even how memorable the name was. For evidence that digital imprints consistently increase recall regardless of formatting, we should expect to see higher recall across all four campaign group names when an imprint is present.

Log odds from the default model are converted to odds ratios for easier interpretation. These are then presented as a table with the output of the regression. To understand the direction of an effect, check the original log odds coefficient: a positive coefficient corresponds to an odds ratio above 1 (higher odds of recall), and a negative coefficient to an odds ratio below 1.
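
The conversion from log odds to odds ratios is an exponentiation. The sketch below applies it to the rounded coefficients and standard errors reported in the tables that follow; the Wald-style intervals are an assumption made for illustration, so they may differ slightly from the reported intervals, which may have been computed another way and from unrounded estimates.

```r
# Rounded log-odds coefficients and standard errors for the imprint effect,
# copied from the four recall tables (CSC, BBA, SFI, CBB)
log_odds <- c(CSC = 0.16, BBA = 0.44, SFI = 0.33, CBB = 0.18)
se       <- c(CSC = 0.11, BBA = 0.11, SFI = 0.11, CBB = 0.12)

odds_ratio <- exp(log_odds)

# Approximate 95% Wald intervals on the odds-ratio scale (illustrative assumption)
ci_lower <- exp(log_odds - qnorm(0.975) * se)
ci_upper <- exp(log_odds + qnorm(0.975) * se)

round(cbind(odds_ratio, ci_lower, ci_upper), 2)
```

An interval that excludes 1 on the odds-ratio scale corresponds to an interval that excludes 0 on the log-odds scale.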

  CSC
Predictors Odds Ratios CI p
(Intercept) 1.58 1.35 – 1.85 <0.001
Advert 1 [1] 1.18 0.94 – 1.47 0.154
Observations 1322
R2 Tjur 0.002
AIC 1743.022
Recall of Common Sense Collective campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) 0.46 0.08 1.58 1.35 1.85
Imprint viewed with material 0.16 0.11 1.18 0.94 1.47
  BBA
Predictors Odds Ratios CI p
(Intercept) 0.69 0.59 – 0.80 <0.001
Advert 2 [1] 1.55 1.24 – 1.92 <0.001
Observations 1322
R2 Tjur 0.012
AIC 1813.606
Recall of Breaking Barriers Alliance campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) -0.37 0.08 0.69 0.59 0.80
Imprint viewed with material 0.44 0.11 1.55 1.24 1.92
  SFI
Predictors Odds Ratios CI p
(Intercept) 0.51 0.43 – 0.60 <0.001
Advert 3 [1] 1.39 1.11 – 1.74 0.004
Observations 1322
R2 Tjur 0.006
AIC 1747.146
Recall of Speak Freely Inc campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) -0.67 0.08 0.51 0.43 0.60
Imprint viewed with material 0.33 0.11 1.39 1.11 1.74
  CBB
Predictors Odds Ratios CI p
(Intercept) 2.28 1.94 – 2.70 <0.001
Advert 4 [1] 1.20 0.95 – 1.52 0.135
Observations 1322
R2 Tjur 0.002
AIC 1584.110
Recall of Campaign for a Better Britain campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) 0.83 0.08 2.28 1.94 2.70
Imprint viewed with material 0.18 0.12 1.20 0.95 1.52

Hypotheses 1c and 2b: outcome: perceived informedness

Did the presence of a digital imprint with a piece of campaign material increase participants’ perception that they had been informed about the source of the content?

  • Outcome: informed, numerical 1-7
  • Predictor: version, binary factor
  • Random effects: id and advert

Model

## boundary (singular) fit: see help('isSingular')
##            df      AIC
## informed_1  7 19075.42
## informed_2  9 19074.43
## informed_3  7 19081.48

Model outcomes: table

## Length of `pred.labels` does not equal number of predictors, no labelling applied.
  Perceived informedness
Predictors Estimates CI p
(Intercept) 4.08 3.71 – 4.46 <0.001
version1 0.21 0.11 – 0.31 <0.001
Training.condition1 -0.07 -0.20 – 0.06 0.296
version1:Training.condition1 0.22 0.08 – 0.36 0.002
Random Effects
σ2 1.71
τ00 id 0.62
τ00 advert 0.14
ICC 0.31
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.011 / 0.313

Plotting: raw data

Plotting: model predictions

Model assumptions

Assumptions checked:

  • Normality of residuals
  • Variance of residuals
  • Normality of residuals for the random effects

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99765, p-value = 0.05214
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.87633, p-value = 0.3232

Supplementary analysis: advert level variations on informedness

Is there evidence to suggest that it is the aesthetic style and content of an advert itself that increases informedness about a source, and do digital imprints play a role in informing citizens above and beyond this?

Claims tested:

  • Informedness about the source will be increased by the presence of a digital imprint, even when accounting for variations in campaign material content and format.

To further explore this, we can conduct an analysis comparing the effect of viewing each campaign post with and without the inclusion of a digital imprint on persuasion knowledge, political goal recognition, and perceived informedness.

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived persuasion knowledge
advert version n mean_pk sd_pk se_pk ci_upper ci_lower
advert.1 0 665 5.40 0.95 0.04 5.47 5.33
advert.1 1 657 5.51 0.98 0.04 5.58 5.43
advert.2 0 657 4.91 1.00 0.04 4.99 4.84
advert.2 1 665 5.04 1.01 0.04 5.12 4.97
advert.3 0 662 4.42 1.01 0.04 4.50 4.35
advert.3 1 660 4.59 1.02 0.04 4.67 4.51
advert.4 0 660 4.47 0.94 0.04 4.54 4.40
advert.4 1 662 4.47 0.98 0.04 4.54 4.39
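
The standard error and confidence interval columns in these descriptive tables can be reproduced from the mean, standard deviation and group size. The sketch below does this for the first row (advert.1, version 0), assuming a normal-approximation 95% interval; the summarising code itself is not shown, so this is an illustration rather than the original calculation.

```r
# First row of the table above: advert.1, version 0
mean_pk <- 5.40
sd_pk   <- 0.95
n       <- 665

se_pk <- sd_pk / sqrt(n)                               # standard error of the mean
ci    <- mean_pk + c(-1, 1) * qnorm(0.975) * se_pk     # lower, upper

round(se_pk, 2)  # 0.04
round(ci, 2)     # 5.33 5.47
```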

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived political goal
advert version n mean_pg sd_pg se_pg ci_upper ci_lower
advert.1 0 665 6.44 0.72 0.03 6.49 6.38
advert.1 1 657 6.45 0.74 0.03 6.51 6.39
advert.2 0 657 6.35 0.86 0.03 6.41 6.28
advert.2 1 665 6.32 0.86 0.03 6.39 6.25
advert.3 0 662 5.24 1.45 0.06 5.35 5.13
advert.3 1 660 5.43 1.36 0.05 5.53 5.32
advert.4 0 660 6.01 1.02 0.04 6.08 5.93
advert.4 1 662 6.02 1.05 0.04 6.10 5.94

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived informedness
advert version n mean_in sd_in se_in ci_upper ci_lower
advert.1 0 665 4.12 1.51 0.06 4.24 4.01
advert.1 1 657 4.44 1.57 0.06 4.56 4.32
advert.2 0 657 3.69 1.59 0.06 3.82 3.57
advert.2 1 665 4.12 1.48 0.06 4.24 4.01
advert.3 0 662 3.75 1.49 0.06 3.86 3.63
advert.3 1 660 4.11 1.53 0.06 4.23 3.99
advert.4 0 660 4.62 1.55 0.06 4.74 4.51
advert.4 1 662 4.79 1.49 0.06 4.90 4.67

Supplementary analysis: digital imprint inclusion and perceived trustworthiness

Did the inclusion of a digital imprint influence how trustworthy and credible the posts were perceived to be?

  • Outcome: accurate/believable/factual/trustworthy, numerical scale 1-7
  • Fixed effects: version of post viewed, binary
  • Random effects: id and advert
  • Control: agreement with post

The models below are created through a function:

  • model_accurate_value <- lmer(accurate_value ~ version + agree + (1|id) + (1|advert), data = imprint_df)
  • model_believable_value <- lmer(believable_value ~ version + agree + (1|id) + (1|advert), data = imprint_df)
  • model_factual_value <- lmer(factual_value ~ version + agree + (1|id) + (1|advert), data = imprint_df)
  • model_trustworthy_value <- lmer(trustworthy_value ~ version + agree + (1|id) + (1|advert), data = imprint_df)
  Perceived accuracy
Predictors Estimates CI p
Intercept 0.93 0.79 – 1.06 <0.001
Digital Imprint included 0.07 0.03 – 0.12 0.002
Agreement 0.72 0.70 – 0.73 <0.001
Random Effects
σ2 0.70
τ00 id 0.13
τ00 advert 0.01
ICC 0.17
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.584 / 0.655
  Perceived believability
Predictors Estimates CI p
Intercept 1.29 1.08 – 1.50 <0.001
Digital Imprint included 0.06 0.01 – 0.11 0.013
Agreement 0.70 0.68 – 0.72 <0.001
Random Effects
σ2 0.83
τ00 id 0.24
τ00 advert 0.04
ICC 0.25
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.504 / 0.628
  Perceived factualness
Predictors Estimates CI p
Intercept 0.69 0.29 – 1.08 0.001
Digital Imprint included 0.06 -0.01 – 0.12 0.091
Agreement 0.56 0.53 – 0.59 <0.001
Random Effects
σ2 1.48
τ00 id 0.70
τ00 advert 0.14
ICC 0.36
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.238 / 0.514
  Perceived trustworthiness
Predictors Estimates CI p
Intercept 0.85 0.55 – 1.15 <0.001
Digital Imprint included 0.08 0.04 – 0.13 0.001
Agreement 0.61 0.59 – 0.63 <0.001
Random Effects
σ2 0.81
τ00 id 0.30
τ00 advert 0.08
ICC 0.32
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.421 / 0.607

Checking assumptions for trustworthy:

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99474, p-value = 0.0001368
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.9701, p-value = 0.8421

Hypothesis 2a: outcome: correct name recall

Did participants recall a higher number of correct campaigner names when they were trained to pay attention to the source?

  • Outcome: name_correct, numerical 0-3
  • Predictor: Training.condition

Model

correct_name <- lm(name_correct ~ Training.condition, data = training_df)

Table

  Number of names correctly recalled
Predictors Estimates CI p
Intercept 2.10 2.01 – 2.18 <0.001
Trained 0.18 0.06 – 0.29 0.004
Observations 1322
R2 / R2 adjusted 0.006 / 0.006

Plotting: Raw data

Plotting: Model predictions

Assumptions

As there is only one binary predictor, there is no need to assess linearity: the model only estimates a difference in means between the two groups. The following are assessed:

  • Normality of residuals
  • Equal variance of residuals
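
The point that a linear model with a single binary predictor only estimates a difference in group means can be illustrated with simulated data. All values and names below (`trained`, `score`) are hypothetical and not taken from training_df.

```r
set.seed(123)
n <- 500
trained <- rep(c(0, 1), each = n / 2)   # 0 = untrained, 1 = trained
# Simulated count outcome; the probabilities are arbitrary
score <- rbinom(n, size = 4, prob = ifelse(trained == 1, 0.57, 0.52))

fit <- lm(score ~ trained)

slope     <- coef(fit)[["trained"]]
intercept <- coef(fit)[["(Intercept)"]]
mean_diff <- mean(score[trained == 1]) - mean(score[trained == 0])

# The intercept is the untrained group mean and the slope is the difference in means
all.equal(slope, mean_diff)
all.equal(intercept, mean(score[trained == 0]))
```

Because the fitted line passes exactly through the two group means, there is no functional form to check; what remains is the distribution and spread of the residuals within each group.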

Hypothesis 2b: outcome: perceived informedness

Were trained participants more likely to correctly identify that they were less informed when a digital imprint was not present, and more informed when a digital imprint was present, compared to the group who received no training?

Model

  • Outcome: perceived informedness, numerical 1-7
  • fixed effect: training condition x version of imprint viewed (interaction effect)
  • random effects: id and advert
inform_training <- lmer(informed ~ Training.condition + version + Training.condition*version + (1|id) + (1|advert), data = imprint_df)

Table

  Perceived Informedness
Predictors Estimates CI p
Intercept 4.08 3.71 – 4.46 <0.001
Training -0.07 -0.20 – 0.06 0.296
Digital Imprint included 0.21 0.11 – 0.31 <0.001
Training*imprint 0.22 0.08 – 0.36 0.002
Random Effects
σ2 1.71
τ00 id 0.62
τ00 advert 0.14
ICC 0.31
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.011 / 0.313

Plotting: Raw data

## `summarise()` has grouped output by 'Training.condition'. You can override
## using the `.groups` argument.
Descriptive bivariate statistics for perceived informedness
Training.condition version n mean_informed sd_informed se_informed ci_upper ci_lower
0 0 1318 4.08 1.56 0.04 4.17 4.00
0 1 1318 4.29 1.54 0.04 4.37 4.20
1 0 1326 4.01 1.60 0.04 4.10 3.92
1 1 1326 4.44 1.54 0.04 4.53 4.36
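
To help read the interaction term, the fixed-effect estimates from the hypothesis 2b table can be combined into a predicted mean for each of the four cells. This sketch assumes treatment (0/1) coding for both factors and uses the rounded coefficients from the table.

```r
# Rounded fixed effects from the hypothesis 2b model table
b <- c(intercept = 4.08, training = -0.07, imprint = 0.21, interaction = 0.22)

# All four combinations of training condition and imprint version
cells <- expand.grid(version = c(0, 1), training = c(0, 1))

cells$predicted <- b[["intercept"]] +
  b[["training"]]    * cells$training +
  b[["imprint"]]     * cells$version +
  b[["interaction"]] * cells$training * cells$version

cells
# version 0/1 without training: 4.08, 4.29
# version 0/1 with training:    4.01, 4.44
```

These predicted values line up with the observed cell means in the descriptive table above: the imprint raises perceived informedness by about 0.21 in the untrained group and by about 0.43 (0.21 + 0.22) in the trained group.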

Plotting: Model predictions

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

Assumptions

The following assumptions are checked:

  • Normality of residuals
  • Variance of residuals
  • Normality of residuals within the random effects

There are no key assumption violations, which supports the reliability of the model predictions.

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99765, p-value = 0.05214
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.87633, p-value = 0.3232

Hypothesis 3: outcome: trustworthiness

Did those informed about the purpose of imprints use their absence/presence to evaluate the trustworthiness and credibility of the posts?

  • Outcome: accurate/factual/believable/trustworthy, numerical 1-7
  • fixed effect: training condition x version of imprint viewed (interaction effect)
  • random effects: id and advert
  • control: agreement with post

Models

Four models, one for each of the outcome variables, are created through the use of a function, using the following formula:

model <- lmer(outcome ~ Training.condition + version + agree_value + Training.condition*version + (1|id) + (1|advert), data = imprint_df)

Tables

  Perceived trustworthiness
Predictors Estimates CI p
Intercept 0.87 0.57 – 1.18 <0.001
Training -0.05 -0.14 – 0.04 0.266
Digital Imprint included 0.07 -0.00 – 0.14 0.057
Agreement 0.61 0.59 – 0.63 <0.001
Training*imprint 0.03 -0.06 – 0.13 0.481
Random Effects
σ2 0.81
τ00 id 0.30
τ00 advert 0.08
ICC 0.32
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.421 / 0.607
  Perceived believability
Predictors Estimates CI p
Intercept 1.28 1.07 – 1.50 <0.001
Training 0.01 -0.08 – 0.10 0.849
Digital Imprint included 0.06 -0.01 – 0.13 0.104
Agreement 0.70 0.68 – 0.72 <0.001
Training*imprint 0.01 -0.09 – 0.11 0.851
Random Effects
σ2 0.83
τ00 id 0.24
τ00 advert 0.04
ICC 0.25
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.504 / 0.628
  Perceived accuracy
Predictors Estimates CI p
Intercept 0.93 0.79 – 1.07 <0.001
Training -0.01 -0.08 – 0.07 0.802
Digital Imprint included 0.05 -0.02 – 0.11 0.138
Agreement 0.72 0.70 – 0.73 <0.001
Training*imprint 0.05 -0.04 – 0.14 0.297
Random Effects
σ2 0.70
τ00 id 0.13
τ00 advert 0.01
ICC 0.17
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.584 / 0.655
  Perceived fact versus opinion
Predictors Estimates CI p
Intercept 0.69 0.29 – 1.10 0.001
Training -0.01 -0.14 – 0.12 0.849
Digital Imprint included 0.02 -0.07 – 0.12 0.603
Agreement 0.56 0.53 – 0.59 <0.001
Training*imprint 0.06 -0.07 – 0.19 0.343
Random Effects
σ2 1.48
τ00 id 0.70
τ00 advert 0.14
ICC 0.36
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.238 / 0.514

Plotting: raw data

Plotting: model predictions

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

Assumptions

Each model is checked for the following assumptions:

  • normality of residuals
  • equal variance of residuals
  • normal distribution of residuals for the random effects

As can be seen, the assumptions are not met for many of the models, creating a need for robustness checks. Additionally, in the supplementary check for the effect of digital imprints on trustworthiness, the effect of imprints is small but positive and significant for three outcomes: trustworthiness, accuracy and believability. This significance is not maintained in the models tested for hypothesis 3, which include the training condition and an interaction effect. Comparisons of model fit suggest the two sets of models are similar, which stresses the need for caution when interpreting these models and for additional checks.

Trustworthy model:

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99468, p-value = 0.0001214
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.97023, p-value = 0.8429

Believable model:

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.98784, p-value = 4.752e-09
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.87462, p-value = 0.3162

Accurate model:

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.9901, p-value = 8.823e-08
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.9477, p-value = 0.7018

Factual model:

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.98918, p-value = 2.582e-08
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.79159, p-value = 0.08797

Hypothesis 4: outcome: confidence in regulation

Does being informed explicitly about the purpose of digital imprints and their relation to regulatory compliance increase perceptions that political advertising is sufficiently regulated in the UK?

This model uses the training_df dataframe.

  • Outcome: confidence in regulation, numerical 1-7
  • Predictor: training condition

Assumptions of normality and equal variance of residuals are violated, due to the skewed distribution of the outcome variable. A model with robust standard errors is fitted as a robustness check, and the same result is found, increasing confidence in the reliability of this estimate.

Model

regulation_model <- lm(election_reg ~ Training.condition, data = training_df)
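
The robustness check itself is not shown, so the following is only a sketch of how heteroskedasticity-robust (HC0, "sandwich") standard errors can be computed for a model of this form in base R. The data are simulated and every name in the block is hypothetical; the original analysis may have used a different estimator or a dedicated package.

```r
set.seed(42)
n <- 400
trained_sim <- rbinom(n, 1, 0.5)
# Skewed 1-7 outcome, simulated independently of training
election_reg_sim <- sample(1:7, n, replace = TRUE,
                           prob = c(0.18, 0.27, 0.22, 0.15, 0.10, 0.05, 0.03))

fit <- lm(election_reg_sim ~ trained_sim)

X <- model.matrix(fit)
e <- residuals(fit)

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread

robust_se    <- sqrt(diag(vcov_hc0))
classical_se <- sqrt(diag(vcov(fit)))

round(cbind(estimate = coef(fit), classical_se, robust_se), 3)
```

If the robust and classical standard errors are similar, as reported for the real model, conclusions about the training effect do not depend on the equal-variance assumption.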

Table

  Perceived sufficiency of advertising regulation
Predictors Estimates CI p
Intercept 3.01 2.90 – 3.12 <0.001
Trained -0.11 -0.27 – 0.04 0.150
Observations 1322
R2 / R2 adjusted 0.002 / 0.001

Plotting: raw data

Plotting: model predictions

Model assumptions